Software Vault: The Gold Collection

Text File | 1990-07-31 | 28KB | 657 lines

CHAPTER 23 - XLAT

The 800 pound gorilla in the computer field is, of course, IBM. It can go its own way, and other companies have to adjust to keep themselves in line with what IBM is doing.

You have been using ASCII characters since the first time you used BASIC (or whatever your first high-level language was). Every character has a unique number which represents it.

    character    ASCII encoding

        A            65d
        a            97d
        ?            63d
        0            48d

IBM has its own encoding for mainframe computers. It is called EBCDIC (pronounced ebb'-sih-dick).{1} It is a spinoff of the coding on punch cards. You remember punch cards? This coding is entirely different from ASCII. Here are some examples.

    character    ASCII code    EBCDIC code

        a            97d          129d
        ?            63d          111d
        0            48d          240d
        H            72d          200d
        I            73d          201d
        J            74d          209d
        K            75d          210d

You can see that there is no simple relationship between the two encodings. Also, notice that while the alphabet is a continuous section of ASCII coding, there are breaks in the EBCDIC code (I=201, J=209).

____________________

1. Which stands for Extended Binary Coded Decimal Interchange Code.

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson
____________________

All PCs use ASCII, so if we want to transfer text from a PC to an IBM mainframe computer, we need to change ASCII -> EBCDIC going to the mainframe and change EBCDIC -> ASCII coming from the mainframe. This is the responsibility of the communications program that runs the modem, so you will never have to do it yourself.

Intel has provided an instruction to help the communications program do this translation. It is called XLAT. In order to use XLAT, you need a translation table. This is a 256 byte array in which the element at index x holds the translation of the character whose code is x. Looking at the data above:

    CHARACTER    ASCII TO EBCDIC TABLE    EBCDIC TO ASCII TABLE

        a        array1 [97] = 129        array2 [129] = 97
        ?        array1 [63] = 111        array2 [111] = 63
        0        array1 [48] = 240        array2 [240] = 48
        H        array1 [72] = 200        array2 [200] = 72
        I        array1 [73] = 201        array2 [201] = 73
        J        array1 [74] = 209        array2 [209] = 74
        K        array1 [75] = 210        array2 [210] = 75

We have two different tables here. Array1 takes the ASCII encoding and gives back the EBCDIC encoding. Array2 takes the EBCDIC encoding and gives back the ASCII encoding. For each character, the appropriate table gives the correct translation from one encoding to the other.

All we need now is the translation instruction. Put the address of the translation table in BX. This table should be in the DS segment, but DS may be overridden:

        mov   bx, offset ascii_to_ebcdic_table

Put the character you want translated in AL:

        mov   al, character

Translate:

        xlat

To translate a 20 byte string of ASCII data into EBCDIC, you might have the following code:

        ;----------
        mov   di, offset ebcdic_string
        mov   ax, seg ebcdic_string
        mov   es, ax
        mov   si, offset ascii_string
        mov   bx, offset ascii_to_ebcdic_table
        mov   cx, 20                ; translate 20 bytes
        cld                         ; clear DF (increment)

    translation_loop:
        lodsb                       ; ascii to al
        xlat                        ; translate
        stosb                       ; al to ebcdic
        loop  translation_loop
        ;----------

Since this is ASCII to EBCDIC, if AL contained 63 before XLAT, then after XLAT AL would contain 111. If AL contained 73 before XLAT, then after XLAT it would contain 201. If AL contained 97 before XLAT, after XLAT it would contain 129.

If we wanted to go the other direction we would have to make the EBCDIC string the source string, make the ASCII string the destination string, and use the other table:

        mov   bx, offset ebcdic_to_ascii_table

The rest of the code would be the same. Since this is done by the communications program, we won't concern ourselves with ASCII <-> EBCDIC any more, but we will use XLAT in two slightly different ways.

First, let's categorize characters. Some things are whitespace (that is, tabs, newlines, spaces, form feeds, etc.).
Some characters are octal, decimal, punctuation, hex, etc.

There is a pre-existing table called translation_table in the subdirectory XTRAFILE. Its pathname is \xtrafile\transtbl.obj. It has all 256 ASCII characters coded in the following way:

    WHITESPACE     EQU   80h    ; 1000 0000
    PUNCTUATION    EQU   40h    ; 0100 0000
    ALPHABETIC     EQU   20h    ; 0010 0000
    OCTAL          EQU   10h    ; 0001 0000
    DECIMAL        EQU   08h    ; 0000 1000
    HEX            EQU   04h    ; 0000 0100
    BOX_CHAR       EQU   02h    ; 0000 0010
    GREEK_CHAR     EQU   01h    ; 0000 0001

If the character is whitespace, then the leftmost bit is set. If it is a Greek character (ASCII 224 - 239 on the PC) then the rightmost bit is set. If it is more than one thing, then all the appropriate bits are set. For instance, '6' is octal, decimal and hex, so its encoding is:

    '6'    0001 1100

'a' is both alphabetic and hex, so its encoding is:

    'a'    0010 0100

The following program inputs a character, and finds out whether it is punctuation, a letter, etc. If it is none of the eight things, then the program prints that nothing was found. It is the same block of code over and over, so you might want to do only part, or you might want to cut it out with a word processor and insert it in the template file (if your copy has page headers and page numbers inside the listing, don't forget to delete them).

    ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
    EXTRN translation_table:BYTE        ;\xtrafile\transtbl.obj
    whitespace_banner    db   "It is whitespace." , 0
    punctuation_banner   db   "It is punctuation." , 0
    alphabet_banner      db   "It is alphabetic." , 0
    octal_banner         db   "It is octal." , 0
    decimal_banner       db   "It is decimal." , 0
    hex_banner           db   "It is hex." , 0
    drawing_banner       db   "It is a box drawing character." , 0
    greek_banner         db   "It is a Greek character." , 0
    nothing_banner       db   "No match was found." , 0
    dirty_flag           db   ?
    ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

    ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
    WHITESPACE     EQU   80h
    PUNCTUATION    EQU   40h
    ALPHABETIC     EQU   20h
    OCTAL          EQU   10h
    DECIMAL        EQU   08h
    HEX            EQU   04h
    BOX_CHAR       EQU   02h
    GREEK_CHAR     EQU   01h

        ; set up the xlat table
        mov   ax, seg translation_table
        mov   es, ax
        mov   bx, offset translation_table

    outer_loop:
        mov   dirty_flag, 0         ; marker for success
        call  get_ascii_byte        ; input a byte to al
        xlat  es:[bx]               ; do the translation

        test  al, WHITESPACE
        jz    punct_check
        push  ax                    ; save translation in al
        mov   ax, offset whitespace_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    punct_check:
        test  al, PUNCTUATION
        jz    alpha_check
        push  ax                    ; save translation in al
        mov   ax, offset punctuation_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    alpha_check:
        test  al, ALPHABETIC
        jz    octal_check
        push  ax                    ; save translation in al
        mov   ax, offset alphabet_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    octal_check:
        test  al, OCTAL
        jz    decimal_check
        push  ax                    ; save translation in al
        mov   ax, offset octal_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    decimal_check:
        test  al, DECIMAL
        jz    hex_check
        push  ax                    ; save translation in al
        mov   ax, offset decimal_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    hex_check:
        test  al, HEX
        jz    drawing_check
        push  ax                    ; save translation in al
        mov   ax, offset hex_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    drawing_check:
        test  al, BOX_CHAR
        jz    greek_check
        push  ax                    ; save translation in al
        mov   ax, offset drawing_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    greek_check:
        test  al, GREEK_CHAR
        jz    nothing_check
        push  ax                    ; save translation in al
        mov   ax, offset greek_banner
        call  print_string
        pop   ax
        mov   dirty_flag, 1         ; set the dirty flag

    nothing_check:
        cmp   dirty_flag, 0         ; was anything found?
        je    print_nothing_banner
        jmp   outer_loop

    print_nothing_banner:
        mov   ax, offset nothing_banner
        call  print_string
        jmp   outer_loop
    ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

You need to:

    link prog1+transtbl+\asmhelp ;

The program is long, but straightforward. Input a character and get its encoding. Test for each characteristic. If it is found, print the appropriate message and set the dirty_flag to indicate something was printed. At the end, if nothing was printed, print the failure message.

Notice that the translation table is addressed through ES and we are using a segment override for it. If you look at the EXTRN statement for 'translation_table', you will see that even though we are using ES, it is declared EXTRN in a segment with an:

    ASSUME ds:DATASTUFF

statement. How can we get away with this? The assembler never deals with 'translation_table' directly. The only thing it does is put the offset in BX. We put the segment override in ourselves with:

    xlat  es:[bx]

so the assembler never has to decide whether a segment override is necessary or which segment override to use.


WORD SEARCH

When doing the mock word search program in the chapter on string instructions, I mentioned that it really wouldn't cut the mustard when it comes to real word searches. Why? If we are looking for "when" we also want to find "When". If we are looking for " searches ", we also want to find " searches,". That is, punctuation should not interfere unless we want it to, and capitals should not interfere unless we want them to.

With the aid of a translation table, we will make a word search program which uses the following rules. In the SEARCH string (the string that defines what you are looking for):

    (1) Any small letter will match either a small or large letter.
    (2) A capital letter will match only a capital letter.
    (3) A blank will match any whitespace or punctuation.
    (4) A punctuation mark will only match itself.
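For readers who like to see rules written out in a high-level language first, here is a minimal C sketch of rules (1) through (4). It is not part of the tutorial's code and does not use a translation table; the names rule_match and match_at are made up for this illustration, and "whitespace" and "punctuation" are assumed to mean whatever the C library's isspace() and ispunct() report in the default "C" locale.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: does one character of the text satisfy one
   character of the SEARCH string under rules (1)-(4)? */
bool rule_match(unsigned char search_ch, unsigned char text_ch)
{
    if (search_ch == text_ch)
        return true;                    /* an exact match always counts */

    if (islower(search_ch))             /* rule 1: small matches small or capital */
        return (unsigned char)tolower(text_ch) == search_ch;

    if (search_ch == ' ')               /* rule 3: blank matches whitespace or punctuation */
        return isspace(text_ch) || ispunct(text_ch);

    /* rules 2 and 4: capitals and punctuation match only themselves,
       and that case was already handled by the exact comparison */
    return false;
}

/* Hypothetical helper: does the whole search string match the text
   starting at text[0]? Stops safely if the text ends first. */
bool match_at(const char *search, const char *text)
{
    for (; *search != '\0'; search++, text++) {
        if (*text == '\0')
            return false;
        if (!rule_match((unsigned char)*search, (unsigned char)*text))
            return false;
    }
    return true;
}
```

With these definitions, match_at("why", "Why not") succeeds, while match_at("Why", "why not") fails because a capital in the search string only matches a capital.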
With these rules "Why" must start with a capital 'W' to be a match, but 'h' and 'y' may be either capital or small. " some," may have any whitespace (including a carriage return) in front, but must have a comma ',' at the end.

This program has two data files. \XTRAFILE\SRCHTBL.OBJ contains the translation table. It is called "wordsearch_table" and is in DATASTUFF, so will be in our normal DS segment.

In order to have text to search I have included an object file that is the text of a chapter from a book. (The object file text includes carriage returns.) The text is a C string - it is terminated by a 0. The book was written by C.D. Huffam, and is the autobiographical account of his dual life as a writer and lecturer. The book is called "A Tale of Two C.D.s". The object file with the text is \XTRAFILE\TWOTALE.OBJ. It is in a private segment and will use ES as a segment register. There is also a straight text file which you can print out so you can see what is in the object file. It is \XTRAFILE\TWOTALE.DOC.

Here's the program. The explanation is at the end.

    ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
    EXTRN tale_text:BYTE, wordsearch_table:BYTE
    entry_message      db   13,10, "Enter a word for a word search", 0
    no_match_message   db   "There was no match", 0
    input_buffer       db   80 dup (?)
    text_file_length   dw   ?
    letter_count       dw   ?
    ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

    ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE

        ; find the length of the text file
        mov   ax, seg tale_text     ; load es register
        mov   es, ax
        mov   di, offset tale_text  ; offset to di
        mov   bx, di                ; copy to bx
        mov   al, 0                 ; try to match zero
        cld                         ; clear DF (increment)

    string_end_loop:
        scasb                       ; search for zero
        jne   string_end_loop

        dec   di                    ; one too many, so decrement
        sub   di, bx                ; finish - start = length
        mov   text_file_length, di  ; length of text_file

    big_loop:
        ; get a word for the word search
        mov   ax, offset entry_message
        call  print_string
        mov   ax, offset input_buffer
        call  get_string

        ; find the end of string
        mov   al, 0                 ; compare with 0
        mov   bx, offset input_buffer
        mov   cx, 0                 ; letter count

    letter_count_loop:
        cmp   al, [bx]              ; compare to 0
        je    end_of_count_loop
        inc   cx                    ; increment count
        inc   bx                    ; increment pointer
        jmp   letter_count_loop

    end_of_count_loop:
        cmp   cx, 0                 ; if 0, string is empty
        je    big_loop              ; so start again
        mov   letter_count, cx

        ; look for word match. In this program, the text string
        ; is referenced by si and the search string is referenced
        ; by di.

        mov   si, offset tale_text
        mov   cx, text_file_length  ; length of file
        sub   cx, letter_count      ; last possible match
        inc   cx                    ; +1 for boundary condition

        ; set up translation table ( it is in DATASTUFF )
        mov   bx, offset wordsearch_table

    word_search_loop:
        push  si                    ; save a copy
        push  cx                    ; save a copy
        mov   di, offset input_buffer
        mov   cx, letter_count

    letter_loop:
        mov   al, es:[si]           ; text to al
        cmp   al, [di]              ; same as search string?
        je    next_letter
        xlat                        ; if not, translate
        cmp   al, [di]              ; allowable substitute?
        jne   new_start             ; if not, start at new place

    next_letter:
        inc   di                    ; move to next letter
        inc   si
        loop  letter_loop

        ; we fell through, so we found a complete match
        jmp   found_it

        ; no match. are we finished?
    new_start:
        pop   cx
        pop   si
        inc   si                    ; move to next character
        loop  word_search_loop

        ; we fell through. finished, but no match
        mov   ax, offset no_match_message
        call  print_string
        jmp   big_loop

    found_it:
        pop   cx                    ; take cx off the stack
        pop   si                    ; start of the match

        ; move 25 characters to buffer for printing
        mov   di, offset input_buffer
        mov   cx, 25

    character_move:
        mov   al, es:[si]
        mov   [di], al
        inc   si                    ; increment pointers
        inc   di
        loop  character_move

        mov   BYTE PTR [di], 0      ; end of string
        mov   ax, offset input_buffer
        call  print_string
        jmp   big_loop

    ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

You need to:

    link prog2+twotale+srchtbl+\asmhelp ;

to get asmhelp and the two data files in the program.

This program is very similar to the search program in the chapter on strings. However, because of where the files are, the pointers have been changed around. Therefore, it is safer if you simply cut out the program with a word processor and paste it into the template file rather than try to modify the previous search program.{2}

It is assumed that you did the string match program. The logic is the same and will not be covered again. First we input a search string. Then, starting at the beginning of the text to be searched, we check until we find the first match. If we find a match, we print out 25 characters starting with the first character of the match. If no match is found, a message to that effect is printed.

The character match is a two step process. The character from the text is put in AL. It is compared with the search character for an EXACT match. If they match, we are done with that character. If not, we use XLAT on AL (the character from the text), which translates it to its allowable substitute. In fact, all this is just: (1) all capital letters become small, (2) all punctuation becomes spaces, and (3) all whitespace becomes spaces. Once again, we compare AL with the search character. If we have a match, ok. If not, we start over at the next position in the text.

The text is in ES, the translation table is in DS, so it is inconvenient to use the string instructions in this program.
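The two step match can be modelled in a few lines of C, which may help when tracing the assembler listing. This is only a sketch: the real contents of SRCHTBL.OBJ are not shown in this chapter, so build_search_table fills a table according to the description above (capitals become small, punctuation and whitespace become spaces, everything else maps to itself), and both function names are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Fill a 256-byte table in the spirit of wordsearch_table.
   ASSUMPTION: this layout follows the prose description only;
   the actual table in SRCHTBL.OBJ may differ in detail. */
void build_search_table(unsigned char table[256])
{
    for (int c = 0; c < 256; c++) {
        if (isupper(c))
            table[c] = (unsigned char)tolower(c);   /* capital -> small */
        else if (ispunct(c) || isspace(c))
            table[c] = ' ';                         /* punctuation, whitespace -> blank */
        else
            table[c] = (unsigned char)c;            /* everything else unchanged */
    }
}

/* Return the offset of the first match of search in text, or -1.
   Mirrors word_search_loop / letter_loop in the listing. */
long find_word(const unsigned char table[256],
               const char *text, const char *search)
{
    size_t text_len = strlen(text);
    size_t word_len = strlen(search);

    if (word_len == 0 || word_len > text_len)
        return -1;

    for (size_t start = 0; start + word_len <= text_len; start++) {
        size_t i;
        for (i = 0; i < word_len; i++) {
            unsigned char al   = (unsigned char)text[start + i];
            unsigned char want = (unsigned char)search[i];

            if (al == want)         /* step 1: exact match */
                continue;
            al = table[al];         /* step 2: what XLAT does to AL */
            if (al != want)
                break;              /* no allowable substitute: new start */
        }
        if (i == word_len)
            return (long)start;
    }
    return -1;
}
```

Unlike the assembler program, this version refuses a search string longer than the text instead of letting the count wrap around, which is a small safety addition of the sketch rather than something the listing does.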
Try to match a word at the beginning of the line, end of the line, with and without punctuation and with and without capitals. If you go across a line break, you need to put two blanks in the search string to stand for CRLF (13,10), since each of the two characters is whitespace and needs its own blank.

____________________

2. You should understand what is going on in the code before you run these programs. I didn't write the code for myself, I wrote it for you. If you run it but don't understand it, it won't help you a bit.
____________________

Suppose you are not interested in all 256 values of the translation table. Let's say that you only want to have a translation table for the numbers from 0 to 99. Can you still use XLAT? Yes, but you need to put in some range checking to make sure that you have valid data.

    MAX_VALUE EQU 99

        mov   al, data_byte         ; byte to al
        cmp   al, MAX_VALUE         ; too large?
        ja    data_error            ; report error
        xlat

This ensures that any data that is out of range is not translated. Therefore the translation table only needs to be 100 bytes long (0 - 99).

If you want more than 256 elements in the translation table, or elements larger than a byte, you need to use words, not bytes, and you cannot use XLAT. You can make your own code to do the same thing.

    MAX_VALUE EQU 999
    my_translation_table   dw   1000 dup (?)

If you put the translation data into the table, you can then have the following code:

        mov   bx, offset my_translation_table

        ; - - - - - translation block
        mov   si, data_word         ; word to si
        cmp   si, MAX_VALUE         ; too large?
        ja    data_error
        shl   si, 1                 ; SI x 2 = number of BYTES into table
        mov   ax, [bx+si]           ; base + offset
        ; - - - - - end of translation block

XLAT is about twice as fast as this last code, so when you have a choice always use XLAT.


SUMMARY

XLAT
    BX holds the address of a 256 byte array called a translation table. AL holds the character to be translated. If x is the value in AL before XLAT, then after XLAT, AL = array[x].
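If it helps to see the summary in a high-level language, the following C sketch models XLAT as a plain array lookup, along with the range-checked word table variant described earlier. It is an illustration only. The function names are invented here, and fill_sample_ascii_to_ebcdic sets just the seven ASCII -> EBCDIC pairs listed in this chapter; every other entry is left at 0 as a placeholder, which is NOT a real EBCDIC value.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of XLAT: AL = table[AL]. */
uint8_t xlat(const uint8_t table[256], uint8_t al)
{
    return table[al];
}

/* Model of the lodsb / xlat / stosb loop: translate count bytes. */
void translate_bytes(const uint8_t table[256],
                     const uint8_t *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = xlat(table, src[i]);
}

/* Fill only the pairs given in this chapter. ASSUMPTION: all other
   entries are zeroed placeholders, not genuine EBCDIC codes. */
void fill_sample_ascii_to_ebcdic(uint8_t table[256])
{
    for (size_t i = 0; i < 256; i++)
        table[i] = 0;
    table[97] = 129;    /* a */
    table[63] = 111;    /* ? */
    table[48] = 240;    /* 0 */
    table[72] = 200;    /* H */
    table[73] = 201;    /* I */
    table[74] = 209;    /* J */
    table[75] = 210;    /* K */
}

/* Model of the word-table translation block: range check, then index.
   Returns 0 on success, -1 if value is out of range (data_error). */
int lookup_word(const uint16_t *table, uint16_t max_value,
                uint16_t value, uint16_t *result)
{
    if (value > max_value)
        return -1;
    *result = table[value];     /* C scales the index; the asm uses shl si,1 */
    return 0;
}
```

Note that the C version never doubles the index by hand: the compiler scales it by the element size, which is the job that shl si, 1 does in the assembler translation block.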